Margin distribution based bagging pruning

Authors

  • Zongxia Xie
  • Yong Xu
  • Qinghua Hu
  • Pengfei Zhu
Abstract

Bagging is a simple and effective technique for generating an ensemble of classifiers. It has been found that the original Bagging ensemble contains many redundant base classifiers. We design a pruning approach to Bagging that improves its generalization power. The proposed technique introduces a margin distribution based classification loss as the optimization objective and minimizes this loss on the training samples; regularization is introduced to control the size of the ensemble. In this way, we obtain a sparse weight vector over the base classifiers. We then rank the base classifiers by their weights and combine those with large weights. We call this technique MArgin Distribution based Bagging pruning (MAD-Bagging). Simple voting and weighted voting are used to combine the outputs of the selected base classifiers. The performance of the pruned ensemble is evaluated on several UCI benchmark tasks, where the base classifiers are trained with SVM, CART, and the nearest neighbor (1NN) rule, respectively. The results show that margin distribution based CART pruned Bagging can significantly improve classification accuracy, whereas SVM and 1NN pruned Bagging improve little compared with single classifiers. © 2012 Elsevier B.V. All rights reserved.
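As a rough illustration of the pipeline described in the abstract (train a Bagging ensemble, minimize a margin distribution based loss with a sparsity-inducing penalty over classifier weights, then keep the largest-weight members and vote), the Python sketch below uses scikit-learn. The squared hinge loss on the ensemble margin, the L1 penalty, the grid-based coordinate descent, and the pruned size of 10 are illustrative assumptions, not the paper's exact objective or optimizer.

```python
# Minimal sketch of margin-distribution-based Bagging pruning for a binary
# task with labels mapped to {-1, +1}. Loss and solver are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
y_pm = 2 * y - 1                                   # map labels {0,1} -> {-1,+1}

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0).fit(X, y)
# H[i, t] = prediction of base classifier t on sample i, in {-1, +1}
H = np.column_stack([2 * est.predict(X) - 1 for est in bag.estimators_])

def loss(w, lam=0.05):
    margins = y_pm * (H @ w)                       # per-sample ensemble margin
    return np.mean(np.maximum(0.0, 1.0 - margins) ** 2) + lam * np.abs(w).sum()

# Crude coordinate descent over a grid; a proper convex solver would be used
# in practice. This only serves to produce a roughly sparse weight vector.
w = np.full(50, 1.0 / 50)
for _ in range(5):
    for t in range(50):
        cand = np.linspace(0.0, 1.0, 21)
        vals = [loss(np.where(np.arange(50) == t, g, w)) for g in cand]
        w[t] = cand[int(np.argmin(vals))]

keep = np.argsort(w)[::-1][:10]                    # 10 largest-weight classifiers
votes = H[:, keep].sum(axis=1)                     # simple (unweighted) voting
pruned_pred = np.where(votes >= 0, 1, -1)
print("pruned ensemble training accuracy:", (pruned_pred == y_pm).mean())
```

Weighted voting, the other combination rule compared in the paper, would instead weight each selected classifier's vote by its learned weight (here, w[keep]).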


Related articles

An Empirical Comparison of Pruning Methods for Ensemble Classifiers

Many researchers have shown that ensemble methods such as Boosting and Bagging improve the accuracy of classification. Boosting and Bagging perform well with unstable learning algorithms such as neural networks or decision trees. Pruning decision tree classifiers is intended to make trees simpler and more comprehensible and to avoid over-fitting. However, it is known that pruning individual classif...

Margin optimization based pruning for random forest

This article introduces a margin optimization based pruning algorithm which is able to reduce the ensemble size and improve the performance of a random forest. A key element of the proposed algorithm is that it directly takes into account the margin distribution of the random forest model on the training set. Four different metrics based on the margin distribution are used to evaluate the ensem...
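For context, the training-set margin of a tree ensemble is commonly defined (following Breiman) as the vote fraction for the true class minus the largest vote fraction for any other class. The snippet below computes this distribution for a scikit-learn random forest; it only illustrates the margin definition, not the four pruning metrics or the algorithm of the cited article.

```python
# Compute the training-set margin distribution of a random forest, using
# Breiman's definition: margin(x, y) = P_trees(vote = y) - max_{c != y} P_trees(vote = c).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# preds[i, t] = class predicted by tree t for sample i
preds = np.stack([tree.predict(X) for tree in rf.estimators_], axis=1)
# votes[i, c] = fraction of trees voting class c for sample i
votes = np.stack([(preds == c).mean(axis=1) for c in rf.classes_], axis=1)

true_vote = votes[np.arange(len(y)), y]            # support for the true class
votes_wo_true = votes.copy()
votes_wo_true[np.arange(len(y)), y] = -np.inf      # mask out the true class
margins = true_vote - votes_wo_true.max(axis=1)

print("mean margin:", margins.mean(), "min margin:", margins.min())
```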

Exploiting diversity for optimizing margin distribution in ensemble learning

Margin distribution is acknowledged as an important factor for improving the generalization performance of classifiers. In this paper, we propose a novel ensemble learning algorithm named Double Rotation Margin Forest (DRMF), which aims to improve the margin distribution of the combined system over the training set. We utilise random rotation to produce diverse base classifiers, and optimize the...

New Ordering-Based Pruning Metrics for Ensembles of Classifiers in Imbalanced Datasets

The task of classification with imbalanced datasets has attracted considerable interest from researchers in recent years. The reason is that many applications and real problems present this feature, causing standard learning algorithms to fall short of the expected performance. Accordingly, many approaches have been designed to address this problem from different perspectives, i.e., da...

Actively Balanced Bagging for Imbalanced Data

Under-sampling extensions of bagging are currently the most accurate ensembles specialized for class imbalanced data. Nevertheless, since improved recognition of the minority class in this type of ensemble is usually accompanied by reduced recognition of the majority classes, we introduce a new two-phase ensemble called Actively Balanced Bagging. The proposal is to first learn a...


Journal:
  • Neurocomputing

Volume 85, Issue 

Pages  -

Publication year: 2012